Conversation

@smarterclayton (Contributor) commented Jan 25, 2019

[sig-storage] Volume limits should verify that all nodes have volume limits [Suite:openshift/conformance/parallel] [Suite:k8s]

Introduced in rebase, https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/21860/pull-ci-openshift-origin-master-e2e-aws/2856/

@wongma7 @gnufied fyi

`[sig-storage] Volume limits should verify that all nodes have volume limits [Suite:openshift/conformance/parallel] [Suite:k8s]`
@openshift-ci-robot added the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) Jan 25, 2019
@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jan 25, 2019
@smarterclayton added the lgtm label (indicates that a PR is ready to be merged) Jan 25, 2019
@wking (Member) commented Jan 26, 2019

Is this flaky, or is it just dead? I may have hit this on every run since the rebase ;).

@wking (Member) commented Jan 26, 2019

unit:

FAIL: github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kuberuntime TestCreatePodSandbox_RuntimeClass/missing_RuntimeClass 0s
FAIL: github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kuberuntime TestCreatePodSandbox_RuntimeClass 100ms

although I don't see how that would be due to f56f493, so

/retest

@wking (Member) commented Jan 26, 2019

/lgtm

for good measure ;).

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@smarterclayton (Contributor Author)

It succeeds extremely rarely. It just so happened that it passed on the two runs prior to the rebase, so we thought it was gravy.

@smarterclayton (Contributor Author)

/test e2e-aws

@wking (Member) commented Jan 26, 2019

Is it worth kicking until the batch job resolves?

@smarterclayton (Contributor Author)

I'm waiting for a green e2e-aws and then I'm going to force merge

@smarterclayton (Contributor Author)

/retest

@gnufied (Member) commented Jan 26, 2019

This requires kubelet 1.12 because it is a new feature that depends on the volume plugin being registered on the node. So far the flakes I have investigated related to this are still using the 1.11 kubelet, but I am still looking.
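
For reference, a rough sketch (not the actual e2e test) of what "volume limits on the node" means here: once the plugin registers, the kubelet publishes `attachable-volumes-*` resources in `node.Status.Allocatable`, and the test just checks that every node reports one. This assumes a reachable kubeconfig and uses the pre-0.18 client-go `List` signature (no context argument), roughly what origin vendors today.

```go
// Sketch only: list nodes and print any attachable-volumes-* allocatable
// resources, which is roughly what the "Volume limits" e2e test looks for.
package main

import (
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	nodes, err := client.CoreV1().Nodes().List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		found := false
		for name, quantity := range node.Status.Allocatable {
			if strings.HasPrefix(string(name), "attachable-volumes-") {
				fmt.Printf("%s: %s=%s\n", node.Name, name, quantity.String())
				found = true
			}
		}
		if !found {
			// A 1.11 kubelet (or an unregistered plugin) shows up here.
			fmt.Printf("%s: no attachable-volumes-* limit reported\n", node.Name)
		}
	}
}
```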

@wking (Member) commented Jan 26, 2019

e2e-aws-builds:

Failing tests:

[Feature:Builds][Slow] openshift pipeline build jenkins-client-plugin tests using the ephemeral template [Suite:openshift]

/test e2e-aws-builds

@smarterclayton (Contributor Author)

/test e2e-aws

@wking (Member) commented Jan 26, 2019

e2e-aws:

2019/01/26 03:40:18 Ran for 1m15s
error: could not run steps: could not wait for template instance to be ready: could not determine if template instance was ready: failed to create objects: object is being deleted: pods "e2e-aws" already exists

/test e2e-aws

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@smarterclayton (Contributor Author)

/retest

@wking (Member) commented Jan 26, 2019

> I'm waiting for a green e2e-aws and then I'm going to force merge

Looking at origin's e2e-aws history, we haven't had anything pass in 11+ hours. So if you don't mind waiting it out while whatever the current batch job is fails, maybe this will just get merged without having to force it. Or maybe Tide will decide it needs to retest it in zounds of possible batch combinations, I dunno ;).

@wking (Member) commented Jan 26, 2019

It's possible this run is dying with:

level=error msg="\t* aws_iam_role.bootstrap: Error creating IAM Role ci-op-2gqyb832-55c01-bootstrap-role: EntityAlreadyExists: Role with name ci-op-2gqyb832-55c01-bootstrap-role already 

although I don't know why that would still be running after an hour. You may want to bump your commit timestamp or something to get a fresh, new namespace.

@smarterclayton (Contributor Author)

I’m more concerned that teardown isn’t running. Is this the installer backgrounding?

@smarterclayton (Contributor Author)

/test e2e-aws

@wking (Member) commented Jan 26, 2019

No teardown logs from this last run, but here's a normal teardown from a recent installer-PR failure:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1129/pull-ci-openshift-installer-master-e2e-aws/3174/artifacts/e2e-aws/container-logs/teardown.log.gz | gunzip | tail -n 3
level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:security-group/sg-0b6ccc03f79dbbb5f" id=sg-0b6ccc03f79dbbb5f
level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:vpc/vpc-03b6fdd1c55f21f19" id=vpc-03b6fdd1c55f21f19
level=info msg=Deleted arn="arn:aws:ec2:us-east-1:460538899914:dhcp-options/dopt-0616c23780c3426ec" id=dopt-0616c23780c3426ec

@wking (Member) commented Jan 26, 2019

My guess is the issue was job 2885 or a similar run. The setup container failed, but then the test container was killed:

level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"
2019/01/26 04:08:59 Container setup in pod e2e-aws failed, exit code 1, reason Error
Another process exited
2019/01/26 04:09:14 Container test in pod e2e-aws failed, exit code 1, reason Error
{"component":"entrypoint","level":"error","msg":"Entrypoint received interrupt: terminated","time":"2019-01-26T05:21:56Z"}
2019/01/26 05:21:56 error: Process interrupted with signal interrupt, exiting in 2s ...
2019/01/26 05:21:56 cleanup: Deleting template e2e-aws

My guess is artifact gathering was slow (there was no cluster), and we reaped teardown before it completed. But I don't see any teardown logs, so it's hard to know.

@smarterclayton (Contributor Author)

I’ve noticed some of that in jobs today.

@smarterclayton (Contributor Author)

/retest

@wking (Member) commented Jan 26, 2019

We should short-circuit artifact gathering when Terraform fails. Have the installer exit 2? Grep the logs?

@wking (Member) commented Jan 26, 2019

Ah, or gate on some really-basic API call succeeding. I can work that up tomorrow if you don't beat me to it ;).
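
Something like this, as a minimal sketch of the "gate on a basic API call" idea; the `KUBE_API_URL` variable and the exit code are placeholders, not what the CI template actually uses:

```go
// Hedged sketch: before gathering artifacts, make one cheap request against
// the API server's /healthz endpoint and bail out if nothing answers, so a
// failed Terraform run doesn't stall in artifact gathering.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
	"time"
)

func clusterReachable(apiURL string) bool {
	client := &http.Client{
		Timeout: 10 * time.Second,
		// The bootstrap cluster serves a self-signed cert; skip verification
		// for this reachability probe only.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	resp, err := client.Get(apiURL + "/healthz")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	if !clusterReachable(os.Getenv("KUBE_API_URL")) {
		fmt.Println("API server unreachable; skipping artifact gathering")
		os.Exit(2)
	}
	fmt.Println("API server healthy; proceeding with artifact gathering")
}
```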

@smarterclayton (Contributor Author)

I’ll probably be looking at other things so don’t worry about me also looking at it :)

@smarterclayton (Contributor Author)

What the fork.

@smarterclayton merged commit 9122b3d into openshift:master Jan 26, 2019
@smarterclayton (Contributor Author)

One risk is that an API failure flake could result in no logs. We should definitely consider something like the gather queue terminating early if enough sequential things aren't gathered.
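
A minimal sketch of that early-termination idea (the names and threshold are hypothetical, not existing CI code):

```go
// Walk the list of artifacts to gather and give up after too many
// consecutive failures, so an API outage doesn't stall the job until the
// global timeout.
package main

import (
	"errors"
	"fmt"
)

const maxConsecutiveFailures = 3

func gatherAll(artifacts []string, gather func(string) error) error {
	failures := 0
	for _, name := range artifacts {
		if err := gather(name); err != nil {
			failures++
			fmt.Printf("failed to gather %s: %v\n", name, err)
			if failures >= maxConsecutiveFailures {
				return errors.New("too many consecutive gather failures; API is probably down")
			}
			continue
		}
		failures = 0 // reset on any success
	}
	return nil
}

func main() {
	// Stand-in gatherer that always fails, to show the short-circuit.
	err := gatherAll([]string{"nodes", "pods", "events", "logs"}, func(string) error {
		return errors.New("connection refused")
	})
	fmt.Println(err)
}
```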

@openshift-ci-robot

@smarterclayton: The following test failed, say /retest to rerun them all:

Test name | Commit | Details | Rerun command
--- | --- | --- | ---
ci/prow/e2e-aws | f56f493 | link | /test e2e-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

`openshift mongodb replication creating from a template`, // flaking on deployment
`should use be able to process many pods and reuse local volumes`, // https://bugzilla.redhat.com/show_bug.cgi?id=1635893

`[sig-storage] Volume limits should verify that all nodes have volume limits`, // flaking due to a kubelet issue
Member

I saw this again in a job kicked off after the merge. Maybe the leading [sig-storage] here is a problem? The entries above don't seem to have those.

@smarterclayton (Contributor Author) commented Jan 26, 2019

Yeah, it needed to be regex quoted. Manually checked and merged a follow-up.
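
For anyone following along, a small illustration of the quoting issue (not the follow-up's exact code): the exclusion entries are compiled as regular expressions, so an unescaped `[sig-storage]` is read as a character class and the entry never matches the test name.

```go
// Unquoted vs. regexp.QuoteMeta'd exclusion entry.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	name := "[sig-storage] Volume limits should verify that all nodes have volume limits"

	raw := regexp.MustCompile("[sig-storage] Volume limits should verify that all nodes have volume limits")
	quoted := regexp.MustCompile(regexp.QuoteMeta(name))

	fmt.Println(raw.MatchString(name))    // false: "[sig-storage]" only matches a single character
	fmt.Println(quoted.MatchString(name)) // true: the bracketed tag is escaped and matches literally
}
```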

Member

Cross-linking #21867.
